Abstract: In this paper we propose a new soft-input soft-output equalization algorithm, offering very good performance/complexity tradeoffs. It follows the structure of the BCJR algorithm, but dynamically constructs a simplified trellis during the forward recursion. In each trellis section, only the M states with the strongest forward metric are preserved, similar to the M-BCJR algorithm. Unlike the M-BCJR, however, the remaining states are not deleted, but rather merged into the surviving states. The new algorithm compares favorably with the reduced-state BCJR algorithm, offering better performance and more flexibility, particularly for systems with higher order modulations.

I. Introduction 

Efficient communication over channels introducing inter-symbol interference (ISI) often requires the receiver to perform channel equalization. Turbo equalization [1] is a technique in which decoding and equalization are performed iteratively, similar to turbo-decoding of serially-concatenated convolutional codes [2]. As depicted in Figure 1, the key element of the receiver employing this method is a soft-input soft-output (SISO) demodulator/equalizer (from now on referred to as just an equalizer), accepting a priori likelihoods of coded bits from the SISO decoder, and producing their a posteriori likelihoods based on the noisy received signal.

The SISO algorithm that computes the exact values of the a posteriori likelihoods is the BCJR algorithm [3]. The complexity of a BCJR equalizer is proportional to the number of states in the trellis representing the modulation alphabet and the ISI, and thus it is exponential in both the length of the channel impulse response (CIR) and the number of bits per symbol in the modulator. This can be a serious drawback in some scenarios, e.g., transmission at a high data rate over a radio channel, where a large signal bandwidth translates to a long CIR, and a high spectral efficiency translates to a large modulation alphabet. Such cases call for alternative SISO equalizers that achieve large complexity savings at the cost of a small performance degradation.

There have been two main trends in the design of such SISOs. The first one relies on reducing the effective length of the channel impulse response, either by linear processing (see, e.g., [4]), or interference cancellation via decision feedback. A particularly good algorithm in this category is the reduced-state BCJR (RS-BCJR) [5], which performs the cancellation of the final channel taps on a per-survivor basis. Iterative decoding with RS-BCJR is very stable, thanks to the high quality of the soft outputs, but the receiver cannot use the signal power contained in the cancelled part of the CIR. Another trend is to adapt "hard-output" sequential algorithms [6] to produce soft outputs [7]. Examples in this category are the M-BCJR and T-BCJR algorithms [8], based on the M- and T-algorithms, and the LISS algorithm [9] based on list sequential decoding. These algorithms have no problem using the signal energy from the whole CIR, and offer much more flexibility in choosing the desired complexity. However, their reliance on ignoring unpromising paths in the trellis or tree causes a bias in the soft output (there are more explored paths with one value of a particular input bit than another), which negatively affects the convergence of iterative decoding.

In this paper we present a new SISO equalization algorithm, inspired by both the M-BCJR and RS-BCJR, which shares many of their advantages, but few of their weaknesses. We call this algorithm the M*-BCJR algorithm, since it resembles the M-BCJR in preserving only a fixed number of trellis states with the largest forward metric. Instead of deleting the excess states, however, the M*-BCJR dynamically merges them with the surviving states, a process that shares some similarity to the static state merging done on a per-survivor basis by the RS-BCJR. For the sake of simpler notation, we present the operation of all BCJR-based algorithms, including the M*-BCJR, in the probability domain. Each of them, however, can be implemented in the log domain for better numerical stability.
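To illustrate the log-domain remark, the following is a minimal sketch of how a sum of probabilities can be replaced by a stable operation on their logarithms (often called the Jacobian logarithm). The function names `log_add` and `log_forward_update` are hypothetical and are not taken from the paper.

```python
import math

def log_add(x: float, y: float) -> float:
    """Return log(exp(x) + exp(y)) without overflow or underflow."""
    if x == -math.inf:
        return y
    if y == -math.inf:
        return x
    hi, lo = (x, y) if x >= y else (y, x)
    # exp(lo - hi) is at most 1, so log1p is evaluated on a safe argument.
    return hi + math.log1p(math.exp(lo - hi))

def log_forward_update(la1: float, lg1: float, la2: float, lg2: float) -> float:
    """Log-domain version of alpha := alpha1*gamma1 + alpha2*gamma2.

    la* and lg* are the logarithms of the forward and branch metrics.
    """
    return log_add(la1 + lg1, la2 + lg2)
```

In the probability domain, long products of small branch metrics can underflow; in this sketch every product becomes a sum of logarithms and every sum becomes a call to `log_add`.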

The rest of the paper is structured as follows. Section 2 
describes the communication system and the task of the SISO 
equalizer and introduces the notation. Section 3 reviews the 
structure of the BCJR, M-BCJR, and RS-BCJR algorithms, 
helping us to introduce the M*-BCJR in Section 4. Section 
5 presents simulation results, and conclusions are given in 
Section 6. 

II. Communication system 

A communication system with turbo equalization is depicted in Figure 1. The information bits are first arranged into blocks and encoded with a convolutional code. The blocks of coded bits are permuted using an interleaver and mapped onto a sequence of complex symbols by the modulator. (In general, the modulator can have memory, but for simplicity we will assume a memoryless mapper.) The channel acts as a discrete-time finite impulse response (FIR) filter introducing ISI, the output of which is further corrupted by additive white Gaussian noise (AWGN). We assume the receiver knows the ISI channel coefficients and the noise variance, and it attempts to recover the information bits by iteratively performing SISO equalization and decoding.

Fig. 1. Communication system with turbo equalization.

Fig. 2. Part of the system to be "soft-inverted" by the SISO equalizer.

The part of the system significant from the point of view of the equalizer is shown in Figure 2. Let $\mathbf{a} = (\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_L)$ denote a sequence of $LK$ bits entering the modulator, arranged into $L$ groups $\mathbf{a}_i = (a_i^1, a_i^2, \ldots, a_i^K)$ of $K$ bits. Each $K$-tuple $\mathbf{a}_i$ selects a complex-valued output symbol $x_i$ from a constellation of size $2^K$ to be transmitted. The sequence of symbols $\mathbf{y} = (y_1, y_2, \ldots, y_{L+S})$ obtained at the receiver is modeled as

$$y_i = \sum_{j=0}^{S} h_j x_{i-j} + n_i, \qquad (1)$$

where $S$ is the memory of the channel, $h_j$, $j = 0, 1, \ldots, S$, are the channel coefficients, and $n_i$, $i = 1, 2, \ldots, L+S$, are i.i.d. zero-mean complex-valued Gaussian random variables with variance $\sigma^2$ per complex dimension. Equation (1) assumes that $x_i$ is zero outside $i = 1, 2, \ldots, L$.
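The channel model (1) can be sketched in a few lines. The code below is only an illustration under stated assumptions: the function name `isi_channel` is hypothetical, indices are 0-based, and the noise is drawn independently for the real and imaginary parts, each with standard deviation `sigma`.

```python
import random

def isi_channel(x, h, sigma, rng):
    """Return y_i = sum_j h_j * x_{i-j} + n_i for the L+S output samples.

    x: list of L transmitted symbols, h: taps h_0..h_S,
    sigma: noise standard deviation per complex dimension,
    rng: a random.Random instance used to draw the noise.
    """
    L, S = len(x), len(h) - 1
    y = []
    for i in range(L + S):
        # x is treated as zero outside its valid index range.
        clean = sum(h[j] * x[i - j] for j in range(S + 1) if 0 <= i - j < L)
        noise = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        y.append(clean + noise)
    return y
```

For example, `isi_channel([1, -1, 1], [1.0, 0.5], 0.0, random.Random(0))` returns the four noiseless samples of a 2-tap channel driven by three BPSK symbols.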

The SISO equalizer for the above channel takes the received symbols $\mathbf{y}$ and the a priori log-likelihood ratios $L_a(a_i^k)$ for each bit $a_i^k$, defined as

$$L_a(a_i^k) = \log \frac{P(a_i^k = +1)}{P(a_i^k = -1)}, \qquad (2)$$

and outputs the a posteriori L-values $L(a_i^k)$,

$$L(a_i^k) = \log \frac{P(a_i^k = +1 \mid \mathbf{y})}{P(a_i^k = -1 \mid \mathbf{y})}. \qquad (3)$$

The values actually fed to the SISO decoder are extrinsic L-values, computed as $L_e(a_i^k) = L(a_i^k) - L_a(a_i^k)$.

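Let $\Lambda(\mathbf{a})$ denote the joint probability that $\mathbf{a}$ was transmitted and $\mathbf{y}$ was received. Then (3) can be expressed as

$$L(a_i^k) = \log \frac{\sum_{\mathbf{a}: a_i^k = +1} \Lambda(\mathbf{a})}{\sum_{\mathbf{a}: a_i^k = -1} \Lambda(\mathbf{a})}, \qquad (4)$$

where the summations are performed over all $\mathbf{a}$ consistent with $a_i^k = \pm 1$. Furthermore,

$$\Lambda(\mathbf{a}) = P(\mathbf{a}) \prod_{i=1}^{L+S} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{1}{2\sigma^2} \left| y_i - \sum_{j=0}^{S} h_j x_{i-j} \right|^2 \right), \qquad (5)$$

where $h_j$, $j = 0, 1, \ldots, S$, and $\sigma^2$ are assumed known at the receiver and $P(\mathbf{a})$ is obtained from $L_a$ as

$$P(\mathbf{a}) = \prod_{i=1}^{L} \prod_{k=1}^{K} P(a_i^k), \qquad (6)$$

with

$$P(a_i^k = \pm 1) = \frac{\exp(\pm L_a(a_i^k))}{1 + \exp(\pm L_a(a_i^k))}. \qquad (7)$$

Since the number of paths involved in the summations of (4) is extremely large for realistic values of $K$ and $L$, a practical algorithm seeks to simplify or approximate this calculation.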
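To make the cost of the direct approach concrete, here is a minimal brute-force sketch of (4) to (7). It assumes BPSK with the hypothetical mapping $x_i = a_i$ (so $K = 1$), 0-based indices, and illustrative function names that do not come from the paper. It enumerates all $2^L$ input sequences, which is feasible only for very short blocks.

```python
import itertools
import math

def prior(bit, la):
    """P(a = bit) for bit in {+1, -1}, from the a priori L-value la as in (7)."""
    return math.exp(bit * la) / (1.0 + math.exp(bit * la))

def path_metric(a, y, h, sigma, la):
    """Lambda(a) from (5) and (6), assuming BPSK with x_i = a_i."""
    L, S = len(a), len(h) - 1
    lam = 1.0
    for bit, l in zip(a, la):
        lam *= prior(bit, l)
    norm = math.sqrt(2.0 * math.pi) * sigma
    for i in range(L + S):
        mean = sum(h[j] * a[i - j] for j in range(S + 1) if 0 <= i - j < L)
        lam *= math.exp(-abs(y[i] - mean) ** 2 / (2.0 * sigma**2)) / norm
    return lam

def brute_force_llr(y, h, sigma, la):
    """A posteriori L-values (4) by summing Lambda(a) over all 2^L sequences."""
    L = len(la)
    num = [0.0] * L
    den = [0.0] * L
    for a in itertools.product((+1, -1), repeat=L):
        lam = path_metric(a, y, h, sigma, la)
        for i, bit in enumerate(a):
            if bit == +1:
                num[i] += lam
            else:
                den[i] += lam
    return [math.log(n / d) for n, d in zip(num, den)]
```

The loop over `itertools.product` visits $2^L$ sequences, so the running time doubles with every additional bit; this is the growth that trellis-based algorithms avoid.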

III. SISO EQUALIZATION 

A. The BCJR algorithm 

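The classical algorithm for efficiently computing (4) by exploiting the trellis structure of the set of all paths is the BCJR algorithm [3]. By defining the state $s_i$ at time $i$ as the past $S$ input symbol $K$-tuples $\mathbf{a}_i$, $s_i = (\mathbf{a}_{i-1}, \ldots, \mathbf{a}_{i-S})$, and a branch metric $\gamma(s_i, \mathbf{a}_i)$ as

$$\gamma(s_i, \mathbf{a}_i) = P(\mathbf{a}_i) \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{1}{2\sigma^2} \left| y_i - \sum_{j=0}^{S} h_j x_{i-j} \right|^2 \right), \qquad (8)$$

the path metric can be factored into

$$\Lambda(\mathbf{a}) = \prod_{i=1}^{L+S} \gamma(s_i, \mathbf{a}_i). \qquad (9)$$

For indices outside the range $i = 1, \ldots, L$, the variables $\mathbf{a}_i$ are regarded as empty sequences $\phi$ with $P(\mathbf{a}_i = \phi) = 1$.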

For every trellis branch $b_i = (s_i, \mathbf{a}_i, s_{i+1})$ starting in state $s_i$, labeled by input bits $\mathbf{a}_i$, and ending in state $s_{i+1}$, the BCJR algorithm computes the sum of the path metrics $\Lambda(\mathbf{a})$ over all paths passing through this branch as

$$\sum_{\mathbf{a}: b_i} \Lambda(\mathbf{a}) = \alpha(s_i)\, \gamma(s_i, \mathbf{a}_i)\, \beta(s_{i+1}). \qquad (10)$$

The computation of the forward state metrics $\alpha(s_i)$ is performed in the forward recursion for $i = 1, 2, \ldots, L+S-1$:

$$\alpha(s_{i+1}) = \sum_{b_i = (s_i, \mathbf{a}_i, s_{i+1})} \alpha(s_i)\, \gamma(s_i, \mathbf{a}_i), \qquad (11)$$

with the initial state value $\alpha(s_1) = 1$. Similarly, the backward recursion computes the backward state metrics $\beta(s_i)$ for $i = L+S, L+S-1, \ldots, 2$:

$$\beta(s_i) = \sum_{b_i = (s_i, \mathbf{a}_i, s_{i+1})} \gamma(s_i, \mathbf{a}_i)\, \beta(s_{i+1}), \qquad (12)$$

with the terminal state value $\beta(s_{L+S+1}) = 1$. With all $\alpha$'s, $\beta$'s, and $\gamma$'s computed, the summations over paths in (4) can be replaced by the summations over branches,

$$L(a_i^k) = \log \frac{\sum_{b_i: a_i^k = +1} \alpha(s_i)\, \gamma(s_i, \mathbf{a}_i)\, \beta(s_{i+1})}{\sum_{b_i: a_i^k = -1} \alpha(s_i)\, \gamma(s_i, \mathbf{a}_i)\, \beta(s_{i+1})}. \qquad (13)$$

The completion phase, in which (13) is evaluated for every $a_i^k$, concludes the algorithm.
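The three phases (forward recursion, backward recursion, completion) can be sketched as follows. This is a minimal probability-domain illustration, not the authors' implementation: it assumes BPSK with the hypothetical mapping $x_i = a_i$, uses 0-based indices, represents the empty input by `0`, and uses the illustrative function name `bcjr_bpsk`.

```python
import math

def bcjr_bpsk(y, h, sigma, la):
    """Probability-domain BCJR for BPSK over a known ISI channel.

    y: received samples (length L+S), h: taps h_0..h_S,
    sigma: noise standard deviation, la: a priori L-values (length L).
    Returns the a posteriori L-values, following (8) to (13).
    """
    L, S = len(la), len(h) - 1
    T = L + S                                   # number of trellis sections
    norm = math.sqrt(2.0 * math.pi) * sigma

    def inputs(i):                              # 0 stands for the empty input
        return (+1, -1) if i < L else (0,)

    def gamma(i, state, a):                     # branch metric (8)
        p = 1.0 if a == 0 else math.exp(a * la[i]) / (1.0 + math.exp(a * la[i]))
        mean = h[0] * a + sum(h[j] * state[j - 1] for j in range(1, S + 1))
        return p * math.exp(-abs(y[i] - mean) ** 2 / (2.0 * sigma**2)) / norm

    def step(state, a):                         # shift the new input into the state
        return ((a,) + state)[:S]

    # Forward recursion (11); branches[i] holds (s_i, a_i, s_{i+1}, gamma).
    alpha = [{(0,) * S: 1.0}]
    branches = []
    for i in range(T):
        nxt, sec = {}, []
        for s, al in alpha[i].items():
            for a in inputs(i):
                g, s2 = gamma(i, s, a), step(s, a)
                sec.append((s, a, s2, g))
                nxt[s2] = nxt.get(s2, 0.0) + al * g
        alpha.append(nxt)
        branches.append(sec)

    # Backward recursion (12), reusing the stored branches.
    beta = [None] * (T + 1)
    beta[T] = {s: 1.0 for s in alpha[T]}
    for i in range(T - 1, -1, -1):
        cur = {}
        for s, a, s2, g in branches[i]:
            cur[s] = cur.get(s, 0.0) + g * beta[i + 1][s2]
        beta[i] = cur

    # Completion phase (13).
    out = []
    for i in range(L):
        num = sum(alpha[i][s] * g * beta[i + 1][s2]
                  for s, a, s2, g in branches[i] if a == +1)
        den = sum(alpha[i][s] * g * beta[i + 1][s2]
                  for s, a, s2, g in branches[i] if a == -1)
        out.append(math.log(num / den))
    return out
```

Storing the branch list per section is a design choice made here so that the backward recursion and the completion phase visit exactly the branches built during the forward pass, which is also how the reduced-trellis variants reuse their trellis.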

The complexity of the BCJR equalizer is proportional to the number of trellis states, $2^{KS}$. The following subsections describe the operation of the RS-BCJR [5] and M-BCJR [8] algorithms, which preserve the general structure of the BCJR, but instead operate on dynamically built simplified trellises with a number of states controlled via a parameter. In the original form of both algorithms, the construction of this simplified trellis occurs during the forward recursion and is based on the values of the forward state metrics, while the backward recursion and the completion phase just reuse the same trellis.

B. The RS-BCJR algorithm 

The way we will describe the operation of the RS-BCJR 
algorithm is slightly different from the presentation in [5], but 
is in fact equivalent. 

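Let us consider two states in the trellis,

$$s_i = (\mathbf{a}_{i-1}, \ldots, \mathbf{a}_{i-S'}, \mathbf{a}_{i-S'-1}, \ldots, \mathbf{a}_{i-S}), \qquad (14)$$

$$s'_i = (\mathbf{a}_{i-1}, \ldots, \mathbf{a}_{i-S'}, \mathbf{a}'_{i-S'-1}, \ldots, \mathbf{a}'_{i-S}), \qquad (15)$$

differing only in the last $S - S'$ binary $K$-tuples. Furthermore, consider two partial paths beginning in states $s_i$ and $s'_i$ and corresponding to the same partial input sequence $\mathbf{a}_{[i,L]} = (\mathbf{a}_i, \ldots, \mathbf{a}_L)$. Both paths are guaranteed to merge after $S - S'$ time indices, and hence their partial path metrics are

$$\Lambda(s_i, \mathbf{a}_{[i,L]}) = \prod_{j=i}^{i+S-S'-1} \gamma(s_j, \mathbf{a}_j) \prod_{j=i+S-S'}^{L} \gamma(s_j, \mathbf{a}_j), \qquad (16)$$

$$\Lambda(s'_i, \mathbf{a}_{[i,L]}) = \prod_{j=i}^{i+S-S'-1} \gamma(s'_j, \mathbf{a}_j) \prod_{j=i+S-S'}^{L} \gamma(s_j, \mathbf{a}_j). \qquad (17)$$

Additionally, close examination of (8) reveals that the difference between $\gamma(s_j, \mathbf{a}_j)$ and $\gamma(s'_j, \mathbf{a}_j)$ for $j = i, \ldots, i+S-S'-1$ is not large. Hence, the difference between $\Lambda(s_i, \mathbf{a}_{[i,L]})$ and $\Lambda(s'_i, \mathbf{a}_{[i,L]})$ is also not large.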

The RS-BCJR equalizer relies on the above observation and, for some predefined $S'$, declares states differing only in the last $S - S'$ binary $K$-tuples indistinguishable. Every such set of states is subsequently reduced to a single state, by selecting the state with the highest forward metric and merging all remaining states into it. Here, we define merging of the state $s'_i$ into $s_i$ as updating the forward metric $\alpha(s_i) := \alpha(s_i) + \alpha(s'_i)$, redirecting all trellis branches ending at $s'_i$ into $s_i$, and deleting $s'_i$ from the trellis. This reduction is performed during the forward recursion, and the $\gamma$'s for the paths that originate from removed states need never be computed. The trellis that results has only $2^{KS'}$ states, compared to $2^{KS}$ in the original trellis. The same trellis is then reused in the backward recursion and the completion stage.
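The merging operation defined above can be sketched for one trellis level as follows. This is an illustration under assumptions: states are stored as tuples ordered from the most recent to the oldest symbol, forward metrics live in a dictionary, and the name `rs_merge` is hypothetical.

```python
def rs_merge(alpha, s_prime):
    """Merge states that agree in their first s_prime (most recent) entries.

    alpha: dict mapping state tuples to forward metrics.
    Returns (reduced_alpha, redirect), where redirect maps every original
    state to the survivor that absorbed it (a survivor maps to itself).
    """
    # Group states that differ only in the last S - S' entries.
    groups = {}
    for state in alpha:
        groups.setdefault(state[:s_prime], []).append(state)

    reduced, redirect = {}, {}
    for members in groups.values():
        # The state with the highest forward metric survives ...
        survivor = max(members, key=lambda s: alpha[s])
        # ... and absorbs the metrics of the others: alpha(s) := alpha(s) + alpha(s').
        reduced[survivor] = sum(alpha[s] for s in members)
        for s in members:
            redirect[s] = survivor
    return reduced, redirect
```

The `redirect` map plays the role of "redirecting all trellis branches": a branch that used to end in a merged state is followed to its survivor instead.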

The RS-BCJR equalizer is particularly effective when the final coefficients of the ISI channel are small in magnitude. Furthermore, the reduced-state trellis retains the same branch-to-state ratio (branch density) and has the same number of branches with $a_i^k = +1$ and $a_i^k = -1$ for any $i$ and $k$, properties that ensure a high quality for the soft outputs and good convergence of iterative decoding. Unfortunately, the RS-BCJR algorithm cannot use the signal power in the final $S - S'$ channel taps, effectively reducing the minimum Euclidean distance between paths. Moreover, the number of surviving states can only be set to a power of $2^K$, which could be a problem for large $K$ (e.g., for a system with 16QAM modulation, equalization using 16 states could result in poor performance, while 256 states could exceed acceptable complexity).
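As a small worked example of this granularity constraint, the sketch below lists the trellis sizes $2^{KS'}$ available to the RS-BCJR for $S' = 0, 1, \ldots, S$. The helper name `rs_bcjr_state_counts` is hypothetical.

```python
def rs_bcjr_state_counts(K, S):
    """Trellis sizes 2^(K*S') reachable by the RS-BCJR for S' = 0..S."""
    return [2 ** (K * s_prime) for s_prime in range(S + 1)]
```

For 16QAM ($K = 4$) and channel memory $S = 2$ the only options are 1, 16 and 256 states, whereas for BPSK ($K = 1$) and $S = 4$ every power of two from 1 to 16 is available.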

C. The M-BCJR algorithm 

The M-BCJR algorithm is based on the M-algorithm [6], originally designed for the problem of maximum likelihood sequence estimation. The M-algorithm keeps track only of the $M$ most likely paths at the same depth, throwing away any excess paths. In the M-BCJR equalizer this idea is applied to the trellis states during the forward recursion. At every level $i$, when all $\alpha(s_i)$ have been computed, the $M$ states with the largest forward metrics are retained, and all remaining states are deleted from the trellis (together with all the branches that lead to or depart from them). The same trellis is then reused in the backward recursion and completion phase.

In [8] it was shown that the M-BCJR algorithm performs well when the state reduction ratio $2^{KS}/M$ is not very large. Also, unlike the RS-BCJR algorithm, it can use the power from all the channel taps. For small $M$, however, the reduced trellis is very sparse, i.e., the branch-to-state ratio is much smaller than in the full trellis and there is often a disproportion between the number of branches labeled with $a_i^k = +1$ and $a_i^k = -1$ for any $i$ and $k$. These factors reduce the quality of the soft outputs and the convergence performance and may require an alternative way of computing the a posteriori likelihoods (like the Bayesian estimation approach presented in [10]). Finally, the M-BCJR algorithm requires performing a partial sort (finding the $M$ largest elements out of $M 2^K$) at every trellis section, which increases the complexity per state.
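The pruning step of the M-BCJR can be sketched with a partial sort from the Python standard library. This is an illustration only; the name `m_prune` and the dictionary representation of the forward metrics are assumptions.

```python
import heapq

def m_prune(alpha, m):
    """Keep the m states with the largest forward metric and drop the rest.

    alpha: dict mapping state tuples to forward metrics.
    """
    if len(alpha) <= m:
        return dict(alpha)
    # heapq.nlargest performs a partial sort: only the m best are ordered.
    survivors = heapq.nlargest(m, alpha, key=alpha.get)
    return {s: alpha[s] for s in survivors}
```

Note that the metric mass of the dropped states is lost in this sketch, together with every branch that would have passed through them; this is the source of the sparse trellis described above.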

IV. The M*-BCJR algorithm 

In this section we demonstrate how the concept of state merging present in the RS-BCJR equalizer can be used to enhance the performance of the M-BCJR algorithm. We call the resulting algorithm the M*-BCJR algorithm.

During the forward recursion the M*-BCJR algorithm retains a maximum of $M$ states for any time index $i$. Unlike the M-BCJR algorithm, however, the excess states are not deleted, but merely merged into some of the surviving states. This means that none of the branches seen so far are deleted from the trellis, but they are just redirected into a more likely state.

Fig. 3. Trellis section a) before and b) after merging an excess state $s'_i$ into a surviving state $s_i$.

The forward recursion of the algorithm can be described as follows:

1) Set $i := 1$. For the initial trellis state $s_1$, set $\alpha(s_1) := 1$. Also, fix the set of states surviving at depth 1 to be $\mathcal{S}_1 := \{s_1\}$.

2) Initialize the set of surviving states at depth $i+1$ to an empty set, $\mathcal{S}_{i+1} := \phi$.

3) For every state $s_i$ in the set $\mathcal{S}_i$, and every branch $b_i = (s_i, \mathbf{a}_i, s_{i+1})$ originating from that state, compute the metric $\gamma(s_i, \mathbf{a}_i)$, and add $s_{i+1}$ to the set $\mathcal{S}_{i+1}$.

4) For every state $s_{i+1}$ in $\mathcal{S}_{i+1}$ compute the forward state metric as a sum of $\alpha(s_i)\gamma(s_i, \mathbf{a}_i)$ over all branches $b_i = (s_i, \mathbf{a}_i, s_{i+1})$ visited in step 3 that end in $s_{i+1}$.

5) If the number of states in $\mathcal{S}_{i+1}$ is no more than $M$, proceed to step 8. Otherwise continue with step 6.

6) Determine the $M$ states in $\mathcal{S}_{i+1}$ with the largest value of the forward state metric. Remove all remaining states from $\mathcal{S}_{i+1}$ and put them in a temporary set $\mathcal{S}'_{i+1}$.

7) Go over all states $s'_{i+1}$ in the set $\mathcal{S}'_{i+1}$ and perform the following tasks for each of them:

- Find a state $s_{i+1}$ in $\mathcal{S}_{i+1}$ that differs from $s'_{i+1}$ by the least number of final $K$-tuples $\mathbf{a}_j$.

- Redirect all branches ending in $s'_{i+1}$ to $s_{i+1}$.

- Add $\alpha(s'_{i+1})$ to the metric $\alpha(s_{i+1})$.

- Delete $s'_{i+1}$ from the set $\mathcal{S}'_{i+1}$.

8) Increment $i$ by 1. If $i < L+S-1$, go to step 2. Otherwise the forward recursion is finished.

The merging of $s'_{i+1}$ into $s_{i+1}$ in step 7 is also illustrated in Figure 3. The backward recursion and the completion phase are subsequently performed only over states remaining in the sets $\mathcal{S}_i$ and only over visited branches (i.e., branches for which the metrics $\gamma$ were calculated in step 3).
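Steps 5 to 7 of the forward recursion can be sketched for a single trellis level as follows. This is a minimal illustration, not the authors' implementation. It assumes states are tuples ordered from the most recent to the oldest symbol, so that "differing by the least number of final $K$-tuples" becomes "sharing the longest common prefix"; ties are broken in favor of the survivor with the larger metric, which is a choice made here. The names `common_prefix` and `m_star_merge` are hypothetical.

```python
import heapq

def common_prefix(s, t):
    """Number of leading (most recent) entries shared by two state tuples."""
    n = 0
    for u, v in zip(s, t):
        if u != v:
            break
        n += 1
    return n

def m_star_merge(alpha, m):
    """Keep m states and merge every excess state into the closest survivor.

    alpha: dict mapping state tuples to forward metrics.
    Returns (merged_alpha, redirect); redirect maps each original state to
    the state that branches ending in it should now point to.
    """
    redirect = {s: s for s in alpha}
    if len(alpha) <= m:                                   # step 5
        return dict(alpha), redirect
    survivors = heapq.nlargest(m, alpha, key=alpha.get)   # step 6
    merged = {s: alpha[s] for s in survivors}
    for s_excess in alpha:                                # step 7
        if s_excess in merged:
            continue
        # Fewest differing final entries = longest shared prefix.
        target = max(survivors, key=lambda s: common_prefix(s, s_excess))
        merged[target] += alpha[s_excess]                 # add alpha(s') to alpha(s)
        redirect[s_excess] = target                       # redirect branches
    return merged, redirect
```

Unlike plain pruning, the total forward metric is preserved in this sketch, and every visited branch still ends in some surviving state via `redirect`.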

Just as for the M-BCJR, the M*-BCJR algorithm can use the power from all channel taps and offers full freedom in choosing the number of surviving states $M$. At the same time, the M*-BCJR never deletes visited branches, and hence it retains the branch density of the full trellis and avoids a disproportion between the number of branches labeled with $a_i^k = +1$ and $a_i^k = -1$. As a result, the soft outputs generated by the M*-BCJR equalizer ensure good convergence of the iterative receiver. Complexity-wise, the algorithm requires some additional processing per state (due to step 7) and some additional memory per branch (the ending state must be remembered for each branch). However, if we regard the calculation of the branch metrics $\gamma$ as the dominant operation, the complexities of the M-BCJR, RS-BCJR, and M*-BCJR equalizers are the same for fixed $M = 2^{KS'}$.

TABLE I
Simulated turbo-equalization scenarios.

                             Scenario 1                                                            Scenario 2
Outer code                   CC(2,1,5)                                                             CC(2,1,5)
Modulation                   BPSK                                                                  16QAM
Channel memory $S$           4                                                                     2
CIR $\{h_0, \ldots, h_S\}$   $\{\sqrt{0.45}, \sqrt{0.25}, \sqrt{0.15}, \sqrt{0.1}, \sqrt{0.05}\}$   $\{1, 1, 1\}$
BCJR states                  16                                                                    256
Interleaver size             1024                                                                  4096
No. of iterations            6                                                                     6

V. Simulation results 

To evaluate the performance of the M*-BCJR equalizer, we considered two turbo-equalization systems. Both systems used a recursive, memory 5, rate 1/2 terminated convolutional code as an outer code. The first system used BPSK modulation and a 5-tap channel (maximum 16 states), and a block of 507 information bits (size 1024 DRP [11] interleaver). The second system used 16QAM modulation, but only a 3-tap channel (maximum 256 states), and a block of 2043 information bits (size 4096 DRP interleaver). The remaining parameters and the channel impulse responses are summarized in Table I.

Both systems were simulated with the M*-BCJR and RS-BCJR equalizers, for several values of $M$ and $S'$. In each case we allowed the receiver to perform 6 iterations. The bit error rates $P_e$ for a range of $E_b/N_0$ (average energy per bit over noise spectral density) are plotted in Figure 4. To better illustrate the complexity-performance tradeoffs achievable with both algorithms, we also plotted the number of states $M$ or $2^{KS'}$ against the $E_b/N_0$ needed to achieve a certain $P_e$ ($10^{-4}$ for system 1 and $10^{-3}$ for system 2) in Figure 5.

The simulations demonstrate the superior performance of the M*-BCJR equalizer. In scenario 1, the M*-BCJR equalizer with 3 states outperforms the RS-BCJR with 8 states by 0.1 dB for $P_e$ below $10^{-4}$. When both algorithms use 4 states, the M*-BCJR equalizer offers a 0.7 dB gain compared to the RS-BCJR. In scenario 2, the M*-BCJR with 16 states achieves almost a 3 dB gain over the RS-BCJR with the same number of states.

VI. Summary 

We have examined the problem of complexity reduction in turbo equalization for systems with large constellation sizes and/or long channel impulse responses. We have defined the operation of merging one state into another and used it to give an alternative interpretation of the RS-BCJR algorithm. Finally, we modified the M-BCJR algorithm, replacing the deletion of excess states by the merging of these states into the surviving states. The resulting algorithm, called the M*-BCJR algorithm, was shown to generate reduced-complexity trellises more suitable for SISO equalization than those obtained by the RS-BCJR and M-BCJR algorithms. Simulation results demonstrated very good performance for turbo-equalization systems employing the M*-BCJR, exceeding that of the RS-BCJR even with much smaller complexities.